-
Notifications
You must be signed in to change notification settings - Fork 25.4k
Add support for Lookup Join on Multiple Fields #131559
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Hi @julian-elastic, I've created a changelog YAML for you. |
a66e89b
to
d9462cc
Compare
b3ac3af
to
ea171b1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second pass, reviewed the whole non-productive code now. Looks good so far, although I have some ideas for more tests.
x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup-join.csv-spec
Show resolved
Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/data/multi_column_joinable.csv
Show resolved
Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/data/multi_column_joinable_lookup.csv
Show resolved
Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/lookup-join.csv-spec
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, I wonder what happens when users ask us to join on an unreasonable amount of fields:
| LOOKUP JOIN lu_idx ON field1, field2, ..., field999
We should circuit break in this case, but there is a chance we might OOM if we messed up our memory accounting somewhere.
We could add a test to HeapAttackIT in a follow-up PR, if you agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The number of join on fields is limited by the number of fields in the lookup index, so it cannot be unlimited. I have some code in x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/LogicalPlanBuilder.java where we throw an exception if you try to join twice on the same field.
"JOIN ON clause does not support multiple fields with the same name, found multiple instances of [{}]", so you cannot repeat them either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users can create indices with many, many, many fields. The ECS schema has north of 1k fields.
With malicious intent, a user could try to bring down a cluster by issuing a join that joins on all the lookup index' fields. I think it'd be worth it to test this for good measure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can certainly make a HeapAttackIT
where we try to crash the world and catch the circuit breaker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a good choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added 2 cases to HeapAttackIT.
One is a working case with 990 join on fields. If I try to have more than 1000 we get an [sensor_data] java.lang.IllegalArgumentException: Limit of total fields [1000] has been exceeded
The other one is a circuit breaker case with multiple fields, but they cannot be too many as there is a lot of data to insert due to all the fields, and then the insert takes a long time and the test can time out.
...n/java/org/elasticsearch/xpack/esql/qa/rest/generative/command/pipe/LookupJoinGenerator.java
Outdated
Show resolved
Hide resolved
...n/java/org/elasticsearch/xpack/esql/qa/rest/generative/command/pipe/LookupJoinGenerator.java
Outdated
Show resolved
Hide resolved
...esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/LookupFromIndexIT.java
Show resolved
Hide resolved
x-pack/plugin/src/yamlRestTest/resources/rest-api-spec/test/esql/190_lookup_join.yml
Outdated
Show resolved
Hide resolved
eee603f
to
ed6946b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I now got everything except some of the compute engine-only code. Looks good, but I think we can clean up the changed (transport) request a little more as it sends redundant data now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to declare this change as notable. I'm not sure what the bar is for that @leemthompo ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you and @tylerperk agree sounds like this passes the notable bar 😄 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@leemthompo How do I declare this change as notable? Can you point me to an example? Or you add it to some other release notes list after it is merged?
...esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/LookupFromIndexIT.java
Show resolved
Hide resolved
...gin/esql/src/test/java/org/elasticsearch/xpack/esql/enrich/LookupFromIndexOperatorTests.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/AbstractLookupService.java
Outdated
Show resolved
Hide resolved
...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/LookupFromIndexService.java
Show resolved
Hide resolved
...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/LookupFromIndexService.java
Show resolved
Hide resolved
...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/LookupFromIndexService.java
Show resolved
Hide resolved
...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/LookupFromIndexService.java
Outdated
Show resolved
Hide resolved
...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/LookupFromIndexService.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good on my side. Left a few small things
...sql/compute/src/main/java/org/elasticsearch/compute/operator/lookup/ExpressionQueryList.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can certainly make a HeapAttackIT
where we try to crash the world and catch the circuit breaker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a good choice.
...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/LookupFromIndexOperator.java
Outdated
Show resolved
Hide resolved
...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/LookupFromIndexService.java
Outdated
Show resolved
Hide resolved
...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/LookupFromIndexOperator.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docs updates LGTM, thanks Julian
Add support for Lookup Join on Multiple Fields
Removed some checks to allow lookup join on multiple fields.
Added a new interface LookupEnrichQueryGenerator, that can be used to get total number of queries and queries by position. The rest of the methods from QueryGenerator are not needed by AbstractLookupService.
That allowed the creation of a new class ExpressionQueryList implements LookupEnrichQueryGenerator, which is responsible for creating the AND query for the different fields. We will likely need to enhance it in the future to support expressions that include OR and NOT as well.
TransportRequest is enhanced to now support
List<MatchConfig>
matchFields instead ofString matchField
. This is how we pass the match fields around now. If we are communicating with an cluster that does not support LookupOnMultipleFields and it is needed by the query we will fail the query. This can happen during rolling upgrade or CCS.